Policy search with high-dimensional context variables
Direct contextual policy search methods learn to improve policy
parameters and simultaneously generalize these parameters
to different context or task variables. However, learning
from high-dimensional context variables, such as camera images,
is still a prominent problem in many real-world tasks.
A naive application of unsupervised dimensionality reduction methods, such as
principal component analysis, to the context variables is insufficient, as
task-relevant input may be ignored.
In this paper, we propose a contextual policy search method in
the model-based relative entropy stochastic search framework
with integrated dimensionality reduction. We learn a model of
the reward that is locally quadratic in both the policy parameters
and the context variables. Furthermore, we perform supervised
linear dimensionality reduction on the context variables
by nuclear norm regularization. The experimental results
show that the proposed method outperforms naive dimensionality
reduction via principal component analysis and
a state-of-the-art contextual policy search method.
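The abstract does not spell out the optimizer, but nuclear-norm-regularized regression has a standard proximal-gradient form. The sketch below assumes a purely bilinear reward model r ≈ θᵀKs (the paper's model is locally quadratic in both the policy parameters θ and the context s), so the function names, step size, and penalty weight lam are illustrative assumptions, not the authors' implementation; a low-rank K then acts as a learned linear dimensionality reduction of the context.

```python
import numpy as np

def prox_nuclear(K, tau):
    """Proximal operator of tau * ||K||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def fit_bilinear_reward(Theta, S, r, lam=0.1, lr=1e-3, n_iters=1000):
    """Fit r_i ~ theta_i^T K s_i with a nuclear-norm penalty on K by
    proximal gradient descent (hypothetical setup, not the paper's full model)."""
    n = Theta.shape[0]
    K = np.zeros((Theta.shape[1], S.shape[1]))
    for _ in range(n_iters):
        resid = np.einsum("ij,jk,ik->i", Theta, K, S) - r  # predictions minus targets
        grad = Theta.T @ (resid[:, None] * S) / n          # gradient of 0.5 * mean squared error
        K = prox_nuclear(K - lr * grad, lr * lam)          # gradient step, then prox
    return K
```

The singular directions of the fitted K that survive soft-thresholding span the reduced context subspace, which is the sense in which the regularizer performs supervised linear dimensionality reduction.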
TD-regularized actor-critic methods
Actor-critic methods can achieve incredible performance on difficult
reinforcement learning problems, but they are also prone to instability. This
is partly due to the interaction between the actor and critic during learning,
e.g., an inaccurate step taken by one of them might adversely affect the other
and destabilize the learning. To avoid such issues, we propose to regularize
the learning objective of the actor by penalizing the temporal difference (TD)
error of the critic. This improves stability by avoiding large steps in the
actor update whenever the critic is highly inaccurate. The resulting method,
which we call the TD-regularized actor-critic method, is a simple plug-and-play
approach to improving the stability and overall performance of actor-critic
methods. Evaluations on standard benchmarks confirm these improvements.
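The abstract leaves the update rule implicit, but the idea admits a compact surrogate loss. The sketch below assumes a standard score-function policy gradient over sampled log-probabilities; the function name, the weight eta, and the exact penalty form (a REINFORCE-style surrogate for the expected squared TD error) are assumptions for illustration, not the authors' code.

```python
import torch

def td_regularized_actor_loss(logp, advantage, td_error, eta=0.1):
    """TD-regularized policy-gradient surrogate (minimal sketch).

    logp      : log pi(a|s) of sampled actions, differentiable w.r.t. the actor
    advantage : advantage estimates (treated as constants)
    td_error  : critic TD errors delta = r + gamma * V(s') - V(s) (constants)
    eta       : penalty weight (illustrative default, an assumption here)
    """
    pg_term = -(logp * advantage.detach()).mean()
    # Score-function surrogate for E_pi[delta^2]: penalizes actions where
    # the critic's TD error is large, damping the actor step there.
    td_penalty = (logp * td_error.detach().pow(2)).mean()
    return pg_term + eta * td_penalty
```

In a full training loop this loss would simply replace the usual policy-gradient loss while the critic is trained on the same TD errors, which is what makes the scheme plug-and-play.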